Controlling Robots in Web Search Engines
Authors
Abstract
Robots are deployed by a Web search engine to collect information from different Web servers in order to maintain the currency of its database of Web pages. In this paper, we investigate the number of robots to be used by a search engine so as to maximize the currency of the database without putting an unnecessary load on the network. We use a queueing model to represent the system. The arrivals to the queueing system are Web pages brought by the robots; service corresponds to the indexing of these pages. The objective is to find the number of robots, and thus the arrival rate of the queueing system, such that the indexing queue is neither starved nor saturated. For this, we consider a finite-buffer queueing system and define the cost function to be minimized as a weighted sum of the loss rate and the fraction of time that the system is empty. Both static and dynamic policies are considered. In the static setting the number of robots is held fixed; in the dynamic setting robots may be reactivated/deactivated at particular points in time. Under the assumption that arrivals form a Poisson process, and that service times are independent and identically distributed random variables with an exponential distribution, we determine the optimal number of robots to deploy in both the static and the dynamic setting. Numerical results indicate that substantial gains can be achieved by dynamically controlling the activity of the robots.
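The static problem in the abstract lends itself to a compact numerical illustration. Below is a minimal sketch, assuming an M/M/1/K indexing queue in which each active robot contributes an equal share of the Poisson arrival rate; the parameter names (lam_per_robot, mu, k, c_loss, c_empty) and the exhaustive search over the number of robots are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of the static policy: pick the number of robots that
# minimizes a weighted sum of the loss rate and the empty-queue fraction
# in an M/M/1/K queue. All parameter names and values are illustrative.

def mm1k_stationary(rho: float, k: int) -> list[float]:
    """Stationary distribution of an M/M/1/K queue with load rho."""
    if abs(rho - 1.0) < 1e-12:
        return [1.0 / (k + 1)] * (k + 1)
    norm = (1.0 - rho) / (1.0 - rho ** (k + 1))
    return [norm * rho ** i for i in range(k + 1)]

def cost(n_robots, lam_per_robot, mu, k, c_loss, c_empty):
    lam = n_robots * lam_per_robot   # aggregate page-arrival rate
    pi = mm1k_stationary(lam / mu, k)
    loss_rate = lam * pi[k]          # pages rejected when the buffer is full
    return c_loss * loss_rate + c_empty * pi[0]

def best_static_policy(lam_per_robot, mu, k, c_loss, c_empty, n_max=50):
    """Exhaustive search for the cost-minimizing number of robots."""
    return min(range(1, n_max + 1),
               key=lambda n: cost(n, lam_per_robot, mu, k, c_loss, c_empty))

if __name__ == "__main__":
    # Example: each robot delivers 0.2 pages per unit time, indexing rate 1.
    print(best_static_policy(lam_per_robot=0.2, mu=1.0, k=20,
                             c_loss=1.0, c_empty=1.0))
```

The same ingredients would feed the dynamic policy the abstract mentions, where robots are reactivated or deactivated depending on the queue length; the sketch above covers only the static case.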
Similar Articles
WWW Robots and Search Engines
Web robots are programs that automatically traverse networks. Currently, their most visible and familiar application is to provide indices for search engines, such as Lycos and Alta Vista, and semi-automatically maintained topic references or subject directories. In this article, we survey the state of the art of Web robots and the search engines that utilize the results of robots...
Full Text

Image flip CAPTCHA
The massive and automated access to Web resources through robots has made it essential for Web service providers to determine whether a "user" is a human or a robot. A Human Interaction Proof (HIP), such as the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), offers a way to make this distinction. CAPTCHA is a reverse Turing test used by Web serv...
Full Text

A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a large amount of information available to the public. Search engines have used conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One method for web mining is evolutionary algorithms, which search according to the user's interests...
Full Text

Investigating the Reaction of Web Search Engines to Metadata Records Based on a Combined Method of Microdata and Linked Data
The purpose of this research was to determine the reaction of Web search engines to metadata records created with a combined method of rich snippets and linked data. 200 metadata records in two groups (100 records with the normal structure as the control group, and 100 records created based on microdata and implemented in RDF/XML as the experimental group) were extracted from the information gatewa...
Full Text

A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search-engine results; ranking takes place according to one or more parameters such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of a search engine and represents the degree of the vitality... (a hedged sketch of such a weighted combination follows this list)
Full Text
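Purely as an illustration of the weighted-parameter ranking that the last abstract above alludes to, here is a hedged sketch. The signal names, the saturation constant, and the weights are assumptions for the example, not the paper's actual method.

```python
# Illustrative hybrid ranking: score each page as a weighted combination
# of link-based signals (backlinks, outlinks) and usage/content signals
# (content relevance, click-through rate). All names and weights are
# assumptions, not taken from the cited paper.
from dataclasses import dataclass

@dataclass
class PageSignals:
    backlinks: int          # number of inbound links
    outlinks: int           # number of outbound links
    content_score: float    # query/content relevance in [0, 1]
    ctr: float              # observed click-through rate in [0, 1]

def hybrid_score(p: PageSignals,
                 w_back=0.4, w_out=0.1, w_content=0.3, w_ctr=0.2) -> float:
    # Dampen raw link counts so no single signal dominates the score.
    back = p.backlinks / (p.backlinks + 10)
    out = p.outlinks / (p.outlinks + 10)
    return (w_back * back + w_out * out
            + w_content * p.content_score + w_ctr * p.ctr)

pages = {
    "a.html": PageSignals(backlinks=120, outlinks=15, content_score=0.8, ctr=0.05),
    "b.html": PageSignals(backlinks=30, outlinks=40, content_score=0.9, ctr=0.12),
}
ranking = sorted(pages, key=lambda u: hybrid_score(pages[u]), reverse=True)
print(ranking)
```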